In this exercise, we will be using functions from the
tidyverse package. You can see we’ve added the chunk option
message = FALSE to hide the version information that
tidyverse normally displays.
library(tidyverse)
Pick one of the plots you’ve made so far in exercise 1.3 or 2.3.
Try changing fig.width, fig.height and dpi in the code chunk options and see what happens.
Can you use these to make a plot with very small text? A plot with very large text?
icecore <- read_csv("icecore.csv")
ggplot(icecore, aes(x = air_age_before_2008, y = CO2_ppm, colour = core)) +
  geom_point()
Copy and paste the code you wrote to make a plot for part (a), and then save the plot to a PNG file using ggsave().
ggplot(icecore, aes(x = air_age_before_2008, y = CO2_ppm, colour = core)) +
  geom_point()
ggsave("Ex_3_2_b.png", width = 4, height = 3, dpi = 600)
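As a rough check on what the ggsave() call above produces: width and height are in inches by default, so the pixel size of the PNG is each dimension multiplied by dpi. The variable names below are only for illustration.

```r
# Pixel dimensions implied by ggsave(width = 4, height = 3, dpi = 600)
width_in  <- 4    # inches
height_in <- 3    # inches
dpi       <- 600  # dots (pixels) per inch

width_px  <- width_in * dpi
height_px <- height_in * dpi

cat(width_px, "x", height_px, "pixels\n")
```

Text in a ggplot is sized in points, which are a physical unit, so shrinking the figure in inches makes the text look relatively larger, and enlarging it makes the text look relatively smaller. Raising dpi alone adds pixels without changing that balance.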
The file ipf_lifts_raw.csv contains the results from a large number of International Powerlifting Federation (IPF) meets. This data was sourced from Open Powerlifting, via Tidy Tuesday 2019-10-08. This is a further subset of the Tidy Tuesday data, containing only “raw” powerlifting competitions (no equipment such as wraps or straps allowed) and competitors whose age was known and under 80 years old.
Powerlifting competitions are judged on each lifter’s “total”, which is the sum of the weight lifted on three lifts: the squat, the bench press and the deadlift.
In this exercise, we will investigate the relationship between powerlifting total (in variable total_lifted_kg) and age (in variable age_class) for each gender (in variable sex).
ipf_lifts_raw <- read_csv("ipf_lifts_raw.csv")
- Make a boxplot, facetted by sex.
ipf_lifts_raw %>%
  drop_na(age_class) %>%
  ggplot(aes(y = age_class, x = total_lifted_kg)) +
  geom_boxplot() +
  facet_wrap(vars(sex), ncol = 2) +
  scale_y_discrete(limits = rev)
- Plot means and error bars showing 95% confidence intervals using stat_summary, also facetted by sex. Do these plots tell a different story?
ipf_lifts_raw %>%
  drop_na(age_class) %>%
  ggplot(aes(y = age_class, x = total_lifted_kg)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.5) +
  stat_summary(fun.data = "mean_cl_normal", geom = "point") +
  facet_wrap(vars(sex), ncol = 2) +
  scale_y_discrete(limits = rev) +
  labs(caption = "Error bars show 95% confidence intervals for the mean.")
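To see what "mean_cl_normal" is summarising, here is a minimal sketch of a t-based 95% confidence interval for a mean, computed by hand in base R. The values in x are made up for illustration and are not taken from ipf_lifts_raw.csv.

```r
# Toy data: hypothetical totals in kg, NOT from ipf_lifts_raw.csv
x <- c(410, 455, 380, 520, 475, 430, 490, 445)

n      <- length(x)
m      <- mean(x)
se     <- sd(x) / sqrt(n)        # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

ci <- c(lower = m - t_crit * se, mean = m, upper = m + t_crit * se)
ci

# The same interval from base R's one-sample t-test
t.test(x)$conf.int
```

Because the standard error shrinks with the square root of the group size, these intervals are very narrow for large groups. They describe the uncertainty about the mean, not the spread of individual lifters that the boxplot shows.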
The code below makes a plot for a subset of the ice core data you saw in exercise 1.3.
Modify it to include direct annotations of the three ice cores (DSS, DE08, DE08-2) instead of a legend.
Hint: it is probably easiest to create a data frame for the annotations, using tribble().
icecore <- read_csv("icecore.csv")
Manually specifying locations for text:
icecore_text <- tribble(
  ~core,    ~air_age_AD, ~CO2_ppm, ~hjust, ~vjust,
  "DE08-2", 1960,        330,      1,      0.5,
  "DE08",   1860,        295,      1,      0.5,
  "DSS",    1200,        290,      0.5,    1
)
icecore %>%
  filter(core != "Vostok") %>%
  ggplot(aes(x = air_age_AD, y = CO2_ppm, colour = core)) +
  geom_point() +
  geom_text(data = icecore_text,
            aes(label = core, hjust = hjust, vjust = vjust)) +
  labs(x = "Air age (year A.D.)",
       y = "CO2 concentration (ppm)") +
  theme(legend.position = "none")
Calculating locations for text using code:
icecore_text <- icecore %>%
  filter(core != "Vostok") %>%
  group_by(core) %>%
  summarise(air_age_AD = max(air_age_AD),
            CO2_ppm = max(CO2_ppm)) %>%
  ungroup()
icecore %>%
  filter(core != "Vostok") %>%
  ggplot(aes(x = air_age_AD, y = CO2_ppm, colour = core)) +
  geom_point() +
  geom_text(data = icecore_text,
            aes(label = core),
            hjust = 1, vjust = 1, nudge_x = -30, nudge_y = -2) +
  labs(x = "Air age (year A.D.)",
       y = "CO2 concentration (ppm)") +
  theme(legend.position = "none")
The questions below relate to the powerlifting data, continuing on from question (c).
Extension: Make another plot showing means plus or minus two standard deviations. Is this closer to the boxplot or closer to the 95% confidence intervals? (Roughly what percentage of a normal distribution would you expect to be within two standard deviations of the mean?)
ipf_lifts_raw %>%
  drop_na(age_class) %>%
  ggplot(aes(y = age_class, x = total_lifted_kg)) +
  stat_summary(fun.min = ~ mean(.) - 2 * sd(.),
               fun.max = ~ mean(.) + 2 * sd(.),
               geom = "errorbar", width = 0.5) +
  stat_summary(fun = ~ mean(.), geom = "point") +
  facet_wrap(vars(sex), ncol = 2) +
  scale_y_discrete(limits = rev) +
  labs(caption = "Error bars show two standard deviations from the mean.")
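The question about the normal distribution can be checked directly in base R. This sketch assumes an exactly normal distribution, which the lifting totals will only approximate.

```r
# Proportion of a normal distribution within two standard deviations of the mean
within_2sd <- pnorm(2) - pnorm(-2)

# Expressed as a percentage, rounded to one decimal place
round(100 * within_2sd, 1)
```

So roughly 95% of individual values would be expected to fall inside these error bars. They describe the spread of the data, as the boxplot does, rather than the uncertainty about the mean.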
Extension: Make one of these plots for weight_class_kg instead of age_class. What went wrong here? (It’s okay if you can’t fix it yet!) What do you think is happening with the weight classes for men?
ipf_lifts_raw %>%
  drop_na(age_class) %>%
  ggplot(aes(y = fct_inseq(weight_class_kg), x = total_lifted_kg)) +
  geom_boxplot() +
  facet_wrap(vars(sex), ncol = 2, scales = "free_y") +
  scale_y_discrete(limits = rev)
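One likely reason the axis order goes wrong without fct_inseq() is that the weight classes are stored as text, so they sort character by character, not by size. The sketch below uses illustrative labels and assumes the column is read as character; it is not run on ipf_lifts_raw.csv itself.

```r
# Illustrative weight-class labels stored as text (assumed, not read from the CSV)
classes <- c("59", "66", "74", "83", "93", "105", "120", "120+")

# Sorted as text: "105" and "120" come before "59" because "1" < "5"
sort(classes, method = "radix")

# Sorted by the numeric part (dropping a trailing "+"): the intended order
numeric_part <- as.numeric(sub("\\+$", "", classes))
classes[order(numeric_part)]
```

fct_inseq() applies the same idea to factor levels, which is why it is used in the plot above.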
© 2021 Statistical Consulting Centre, The University of Melbourne.